Abstract. We give another proof of a theorem of Fife, understood 
broadly as providing a finite automaton that gives a complete descrip- 
tion of all infinite binary overlap-free words. Our proof is significantly 
simpler than those in the literature. As an application we give a complete 
characterization of the overlap-free words that are 2-automatic. 



1 Introduction 

Repetitions in words is a well-researched topic. Among the various themes stud- 
ied, the binary overlap-free words play an important role, both historically and 
as an example exhibiting interesting structure. Here by an overlap we mean a 
word of the form $axaxa$, where $a$ is a single letter and $x$ is a (possibly empty) 
word. 

It is easy to see that neither the finite nor the infinite binary overlap-free 
words form a regular language. Nevertheless, in 1980, Earl Fife [8] proved a 
theorem characterizing the infinite binary overlap-free words as encodings of 
paths in a finite automaton. His theorem was rather complicated to state and 
the proof was difficult. Berstel [3] later simplified the exposition, and both Carpi 
[5] and Cassaigne [7] gave an analogous analysis for the case of finite words. Also 
see g]. 

In this note we show how to use the factorization theorem of Restivo and 
Salemi [11] to give an alternate (and, we hope, significantly simpler) proof of 
Fife's theorem — here understood in the general sense of providing a finite 
automaton whose paths encode all infinite binary overlap-free words. 

As a consequence we are able to disprove a conjecture on the fragility of 
overlap-free words. 



2 Notation 



Let $\Sigma$ be a finite alphabet. We let $\Sigma^*$ denote the set of all finite words over $\Sigma$ 
and $\Sigma^\omega$ denote the set of all (right-) infinite words over $\Sigma$. We say $y$ is a factor 
of a word $w$ if there exist words $x$, $z$ such that $w = xyz$. 

If $x$ is a finite word, then $x^\omega$ represents the infinite word $xxx\cdots$. 
As mentioned above, an overlap is a word of the form $axaxa$, where $a \in \Sigma$ 
and $x \in \Sigma^*$. An example of an overlap in English is the word alfalfa. A finite 
or infinite word is overlap-free if it contains no finite factor that is an overlap. 
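
The definition can be checked mechanically on finite words. The following is a minimal Python sketch, not part of the original argument; the function name `has_overlap` is our own. It relies on the observation that $axaxa$ with $|ax| = p$ is exactly a factor of length $2p+1$ having period $p$.

```python
def has_overlap(w: str) -> bool:
    """Return True if the finite word w contains a factor of the form axaxa."""
    n = len(w)
    # An overlap with |ax| = p needs 2p + 1 letters, so p <= (n - 1) // 2.
    for p in range(1, (n - 1) // 2 + 1):
        run = 0  # current number of consecutive positions i with w[i] == w[i + p]
        for i in range(n - p):
            run = run + 1 if w[i] == w[i + p] else 0
            if run > p:  # p + 1 consecutive matches: a factor of length 2p + 1 with period p
                return True
    return False


if __name__ == "__main__":
    print(has_overlap("alfalfa"))  # the English example from the text
    print(has_overlap("0110"))
```

Squares such as 00 are allowed; only the extra trailing letter turns a square into an overlap.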



From now on we fix $\Sigma = \{0, 1\}$. The most famous infinite binary overlap-free 
word is $\mathbf{t}$, the Thue-Morse word, defined as the fixed point, starting with 0, of 
the Thue-Morse morphism $\mu$, which maps 0 to 01 and 1 to 10. We have 

\[ \mathbf{t} = t_0 t_1 t_2 \cdots = 0110100110010110 \cdots . \]

The morphism $\mu$ has a second fixed point, $\overline{\mathbf{t}} = \mu^\omega(1)$, which is obtained from $\mathbf{t}$ 
by applying the complementation coding defined by $\overline{0} = 1$ and $\overline{1} = 0$. 
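
As an illustration, prefixes of both fixed points can be generated by iterating the morphism. This is a small Python sketch; the helper names `mu`, `complement`, and `thue_morse_prefix` are ours and chosen only for this example.

```python
MU = {"0": "01", "1": "10"}  # the Thue-Morse morphism


def mu(w: str) -> str:
    """Apply the Thue-Morse morphism letter by letter."""
    return "".join(MU[c] for c in w)


def complement(w: str) -> str:
    """The complementation coding: swap 0 and 1."""
    return w.translate(str.maketrans("01", "10"))


def thue_morse_prefix(n: int, start: str = "0") -> str:
    """First n letters of the fixed point of mu that starts with `start`."""
    w = start
    while len(w) < n:
        w = mu(w)  # each application doubles the length and extends the prefix
    return w[:n]


if __name__ == "__main__":
    print(thue_morse_prefix(16))        # prefix of t
    print(thue_morse_prefix(16, "1"))   # prefix of the complementary fixed point
```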

We let $\mathcal{O}$ denote the set of (right-) infinite binary overlap-free words. 

We now recall the infinite version of the factorization theorem of Restivo and 
Salemi [11] as stated in [1, Lemma 3]. 

Theorem 1. Let $x \in \mathcal{O}$, and let $P = \{p_0, p_1, p_2, p_3, p_4\}$, where $p_0 = \epsilon$, $p_1 = 0$, 
$p_2 = 00$, $p_3 = 1$, and $p_4 = 11$. Then there exist $y \in \mathcal{O}$ and $p \in P$ such 
that $x = p\mu(y)$. Furthermore, this factorization is unique, and $p$ is uniquely 
determined by inspecting the first 5 letters of $x$. 

We can now iterate the factorization theorem to get 

Corollary 1. Every infinite overlap-free word $x$ can be written uniquely in the 
form 

\[ x = p_{i_1}\mu(p_{i_2}\mu(p_{i_3}\mu(\cdots))) \tag{1} \]

with $i_j \in \{0, 1, 2, 3, 4\}$ for $j \ge 1$, subject to the understanding that if there exists 
$c$ such that $i_j = 0$ for $j \ge c$, then we also need to specify whether the "tail" of 
the expansion represents $\mu^\omega(0) = \mathbf{t}$ or $\mu^\omega(1) = \overline{\mathbf{t}}$. Furthermore, every truncated 
expansion 

\[ p_{i_1}\mu(p_{i_2}\mu(p_{i_3}\mu(\cdots p_{i_{n-1}}\mu(p_{i_n})\cdots))) \]

is a prefix of $x$, with the understanding that if $i_n = 0$, then we need to replace 0 
with either 1 (if the "tail" represents $\mathbf{t}$) or 3 (if the "tail" represents $\overline{\mathbf{t}}$). 

Proof. The form (1) is unique, since each $p_i$ is uniquely determined by the first 
5 characters of the associated word. 

Thus, we can associate each infinite binary overlap-free word $x$ with the 
essentially unique infinite sequence of indices $\mathbf{i} := (i_j)_{j \ge 1}$ coding elements in $P$, 
as specified by (1). If $\mathbf{i}$ ends in $0^\omega$, then we need an additional element (either 1 or 
3) to disambiguate between $\mathbf{t}$ and $\overline{\mathbf{t}}$ as the "tail". In our notation, we separate this 
additional element with a semicolon so that, for example, the string $000\cdots; 1$ 
represents $\mathbf{t}$ and $000\cdots; 3$ represents $\overline{\mathbf{t}}$. 

Other sequences of interest include $203000\cdots; 3$, which codes $001001\overline{\mathbf{t}}$, the 
lexicographically least infinite binary overlap-free word, and $2(31)^\omega$, which codes the word having, 
in the $i$'th position, the number of 0's in the binary expansion of $i$, taken modulo 2. 
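
The truncated expansions of Corollary 1 are easy to evaluate by machine. Below is a hedged Python sketch (the names `P`, `mu`, and `decode` are ours): it decodes a finite list of indices from the inside out. Following the convention above, a trailing 0 is replaced by 3 when the tail is $\overline{\mathbf{t}}$, so the truncated code 2, 0, 3, 0, 0, 3 decodes to a prefix of $001001\overline{\mathbf{t}}$. The second check compares a truncation of $2(31)^\omega$ with the parity of the number of 0's in the binary expansion of $i$, taking the expansion of 0 to be empty (an assumption on our part).

```python
P = {0: "", 1: "0", 2: "00", 3: "1", 4: "11"}  # the prefixes p_0, ..., p_4
MU = {"0": "01", "1": "10"}


def mu(w: str) -> str:
    """Apply the Thue-Morse morphism."""
    return "".join(MU[c] for c in w)


def decode(indices):
    """Truncated expansion p_{i_1} mu(p_{i_2} mu(... mu(p_{i_n}) ...))."""
    word = ""
    for i in reversed(indices):  # innermost index first
        word = P[i] + mu(word)
    return word


def zero_parity(i: int) -> str:
    """Number of 0's in the binary expansion of i, modulo 2 (empty expansion for i = 0)."""
    zeros = bin(i)[2:].count("0") if i > 0 else 0
    return str(zeros % 2)


if __name__ == "__main__":
    print(decode([2, 0, 3, 0, 0, 3])[:16])
    prefix = decode([2] + [3, 1] * 4)
    print(all(prefix[i] == zero_parity(i) for i in range(len(prefix))))
```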

Of course, not every possible sequence $(i_j)_{j \ge 1}$ of indices corresponds to 
an infinite overlap-free word. For example, every infinite word coded by $21\cdots$ 
represents $00\mu(0\mu(\cdots))$ and hence begins with 000 and has an overlap. Our goal 
is to characterize precisely, using a finite automaton, those infinite sequences 
corresponding to overlap-free words. 

We recall some basic facts about overlap-free words. 



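Lemma 1. Let $a \in \Sigma$. Then 

(a) $x \in \mathcal{O} \iff \mu(x) \in \mathcal{O}$; 

(b) $a\mu(x) \in \mathcal{O} \iff \overline{a}x \in \mathcal{O}$; 

(c) $aa\mu(x) \in \mathcal{O} \iff \overline{a}x \in \mathcal{O}$ and $x$ begins with $\overline{a}a\overline{a}$. 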



Proof. See, for example, [1]. 
Next, we describe the relationships between these classes: 



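Lemma 2. Let $x$ be an infinite binary word. Then 

\begin{align}
x \in A &\iff \mu(x) \in A \tag{2} \\
x \in B &\iff 0\mu(x) \in A \tag{3} \\
x \in C &\iff 00\mu(x) \in A \tag{4} \\
x \in D &\iff 1\mu(x) \in A \tag{5} \\
x \in E &\iff 11\mu(x) \in A \tag{6} \\
x \in D &\iff \mu(x) \in B \tag{7} \\
x \in B &\iff 0\mu(x) \in B \tag{8} \\
x \in E &\iff 1\mu(x) \in B \tag{9} \\
x \in B &\iff \mu(x) \in D \tag{10} \\
x \in D &\iff 1\mu(x) \in D \tag{11} \\
x \in C &\iff 0\mu(x) \in D \tag{12} \\
x \in I &\iff \mu(x) \in E \tag{13} \\
x \in C &\iff 0\mu(x) \in E \tag{14} \\
x \in F &\iff \mu(x) \in C \tag{15} \\
x \in E &\iff 1\mu(x) \in C \tag{16} \\
x \in J &\iff 0\mu(x) \in I \tag{17} \\
x \in G &\iff 1\mu(x) \in F \tag{18} \\
x \in K &\iff \mu(x) \in J \tag{19} \\
x \in J &\iff \mu(x) \in K \tag{20} \\
x \in B &\iff 0\mu(x) \in J \tag{21} \\
x \in C &\iff 0\mu(x) \in K \tag{22} \\
x \in H &\iff \mu(x) \in G \tag{23} \\
x \in G &\iff \mu(x) \in H \tag{24} \\
x \in D &\iff 1\mu(x) \in G \tag{25} \\
x \in E &\iff 1\mu(x) \in H \tag{26}
\end{align}

Proof. 

(2): Follows immediately from Lemma 1(a). 

(3), (5), (7), (10): Follow immediately from Lemma 1(b). 

(4), (6), (9), (12): Follow immediately from Lemma 1(c). 

(8): $0\mu(x) \in B \iff 10\mu(x) = \mu(1x) \in \mathcal{O} \iff 1x \in \mathcal{O}$. 

(11): Just like (8). 

(13): $\mu(x) \in E \iff$ ($0\mu(x) \in \mathcal{O}$ and $\mu(x)$ begins with 010) $\iff$ ($1x \in \mathcal{O}$ and $x$ begins with 00). 

(15): Just like (13). 

(14): $0\mu(x) \in E \iff$ ($00\mu(x) \in \mathcal{O}$ and $0\mu(x)$ begins with 010) $\iff$ ($1x \in \mathcal{O}$ and $x$ begins with 101). 

(16): Just like (14). 

(17): $0\mu(x) \in I \iff$ ($10\mu(x) \in \mathcal{O}$ and $0\mu(x)$ begins with 00) $\iff$ ($\mu(1x) \in \mathcal{O}$ and $x$ begins with 0) $\iff$ ($1x \in \mathcal{O}$ and $x$ begins with 0). 

(18): Just like (17). 

(19): $\mu(x) \in J \iff$ ($1\mu(x) \in \mathcal{O}$ and $\mu(x)$ begins with 0) $\iff$ ($0x \in \mathcal{O}$ and $x$ begins with 0). 

(20), (23), (24): Just like (19). 

(21): $0\mu(x) \in J \iff$ ($10\mu(x) \in \mathcal{O}$ and $0\mu(x)$ begins with 0) $\iff \mu(1x) \in \mathcal{O} \iff 1x \in \mathcal{O}$. 

(25): Just like (21). 

(22): $0\mu(x) \in K \iff$ ($00\mu(x) \in \mathcal{O}$ and $0\mu(x)$ begins with 0) $\iff$ ($1x \in \mathcal{O}$ and $x$ begins with 101). 

(26): Just like (22). 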

We can now use the result of the previous lemma to create an 11-state automaton 
that accepts all infinite sequences $(i_j)_{j \ge 1}$ over $\Delta := \{0, 1, 2, 3, 4\}$ such 
that $p_{i_1}\mu(p_{i_2}\mu(p_{i_3}\mu(\cdots)))$ is overlap-free. Each state represents one of the sets 
$A, B, \ldots, K$ defined above, and the transitions are given by Lemma 2. 
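
To make the construction concrete, here is a hedged Python sketch of the transition table as we read it off from the 25 equivalences of Lemma 2: an equivalence "$x \in T \iff p_a\mu(x) \in S$" is recorded as a transition from $S$ to $T$ on the symbol $a$. The names `DELTA` and `run` are ours, and the table is a reconstruction from the lemma rather than a copy of the figure.

```python
# DELTA[S][a] = T encodes "x in T  <=>  p_a mu(x) in S" (Lemma 2).
# Missing entries correspond to the empty set of infinite words.
DELTA = {
    "A": {0: "A", 1: "B", 2: "C", 3: "D", 4: "E"},
    "B": {0: "D", 1: "B", 3: "E"},
    "C": {0: "F", 3: "E"},
    "D": {0: "B", 1: "C", 3: "D"},
    "E": {0: "I", 1: "C"},
    "F": {3: "G"},
    "G": {0: "H", 3: "D"},
    "H": {0: "G", 3: "E"},
    "I": {1: "J"},
    "J": {0: "K", 1: "B"},
    "K": {0: "J", 1: "C"},
}


def run(code, start="A"):
    """Follow a finite code prefix; return the state reached, or None if the path dies."""
    state = start
    for a in code:
        state = DELTA[state].get(a)
        if state is None:
            return None
    return state


if __name__ == "__main__":
    print(run([2, 0, 3]))  # state reached after the code prefix 203
    print(run([2, 1]))     # no such path: every word coded by 21... begins with 000
```

A finite code prefix is viable exactly when `run` does not return `None`; the table has 25 entries, one per equivalence in the lemma.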

Of course, we also need to verify that transitions not shown correspond to 
the empty set of infinite words. For example, a transition out of $B$ on the symbol 
2 would correspond to the set $\{x : 100\mu(x) \in \mathcal{O}\}$. But if $x$ begins with 0, then 
$100\mu(x) = 10001\cdots$ contains the overlap 000 as a factor, whereas if $x$ begins 
with 10, then $100\mu(x) = 1001001\cdots$ contains the overlap 1001001 as a factor, 
and if $x$ begins with 11, then $100\mu(x) = 1001010\cdots$ contains 01010 as a factor. 
Similarly, we can (somewhat tediously) verify that all other transitions not given 
in Figure 1 correspond to the empty set: 
The proof of most of these is immediate. (We have not listed $\delta(I, a)$ for 
$a \in \{0, 2, 3, 4\}$, nor $\delta(G, a)$ for $a \in \{1, 2, 4\}$, nor $\delta(H, a)$ for $a \in \{1, 2, 4\}$, as these 
are symmetric with other cases.) The only one that requires some thought is 



- If $x$ begins 00, then $011\mu(x) = 0110101\cdots$, which has 10101 as a factor. 

- If $x$ begins 01, then $011\mu(x) = 0110110\cdots$, which has 0110110 as a factor. 

- If $x$ begins 1, then $011\mu(x) = 01110\cdots$, which has 111 as a factor. 
Fig. 1. Automaton coding infinite binary overlap-free words 

From Lemma 2 and the results above, we get 

Theorem 2. Every infinite binary overlap-free word $x$ is encoded by an infinite 
path, starting in $A$, through the automaton in Figure 1. 

Every infinite path through the automaton not ending in $0^\omega$ codes a unique 
infinite binary overlap-free word $x$. If a path $\mathbf{i}$ ends in $0^\omega$ and this suffix corresponds 
to a cycle on state $A$ or a cycle between states $B$ and $D$, then $x$ is coded 
by either $\mathbf{i}; 1$ or $\mathbf{i}; 3$. If a path $\mathbf{i}$ ends in $0^\omega$ and this suffix corresponds to a cycle 
between states $J$ and $K$, then $x$ is coded by $\mathbf{i}; 1$. If a path $\mathbf{i}$ ends in $0^\omega$ and this 
suffix corresponds to a cycle between states $G$ and $H$, then $x$ is coded by $\mathbf{i}; 3$. 

Corollary 2. Each of the 11 sets $A, B, \ldots, K$ is uncountable. 

Proof. We prove this for $K$, with the proof for the other sets being similar. 
Elements in the set $K$ correspond to those infinite paths leaving the state $K$ 
in Figure 1. It therefore suffices to produce uncountably many distinct paths 
leaving $K$. One way to do this, for example, is by $\{13010, 1301000\}^\omega$. 

3 The lexicographically least overlap-free word 

We now recover a theorem of [1]: 



Theorem 3. The lexicographically least infinite binary overlap-free word is $001001\overline{\mathbf{t}}$. 

Proof. Let $x$ be the lexicographically least infinite binary overlap-free word, and let $y$ be its code. 
Then $y[1]$ must be 2, since any other choice codes a word that starts with 01 
or something lexicographically greater. Once $y[1] = 2$ is chosen, the next two 
symbols must be $y[2..3] = 03$. Now we are in state $G$. We argue that the lexicographically 
least string that follows causes us to alternate between states $G$ and 
$H$ on 0, producing $100\cdots$. For otherwise our only choices are 30, 31, or (if we are 
in $G$) 33 as the next two symbols, and all of these code a word lexicographically 
greater than 100. Hence $y = 203\,0^\omega; 3$ is the code for the lexicographically least 
sequence, and this codes $001001\overline{\mathbf{t}}$. The tail symbol is 3 because, by Theorem 2, a cycle 
between $G$ and $H$ on 0 represents $\overline{\mathbf{t}}$; the tail $\mathbf{t}$ would give $001001\mathbf{t}$, which begins with the overlap 0010010. 

4 Automatic infinite binary overlap-free words 

As a consequence of Theorem 2, we can give a complete description of the infinite 
binary overlap-free words that are 2-automatic [2]. Recall that an infinite 
word $(a_n)_{n \ge 0}$ is $k$-automatic if there exists a deterministic finite automaton with 
output that, on input $n$ expressed in base $k$, produces an output associated with 
the state last visited that is equal to $a_n$. 

Theorem 4. An infinite binary overlap-free word is 2-automatic if and only if 
its code is both specified by the DFA given above in Figure 1, and is ultimately 
periodic. 

First, we need two lemmas: 

Lemma 3. An infinite binary word $x = a_0 a_1 a_2 \cdots$ is 2-automatic if and only 
if $\mu(x)$ is 2-automatic. 

Proof. For one direction, we use the fact that the class of $k$-automatic sequences 
is closed under uniform morphisms [2, Theorem 6.8.3]. So if $x$ is 2-automatic, 
so is $\mu(x)$. 

For the other, we use the well-known characterization of automatic sequences 
in terms of the $k$-kernel [2, Theorem 6.6.2]: a sequence $(c_n)_{n \ge 0}$ is $k$-automatic if 
and only if its $k$-kernel, defined by 

\[ \{ (c_{k^e n + i})_{n \ge 0} : e \ge 0 \text{ and } 0 \le i < k^e \}, \]

is finite. Furthermore, each sequence in the $k$-kernel is $k$-automatic. 

Now if $y = \mu(x) = b_0 b_1 b_2 \cdots$, then $b_{2n} = a_n$. So one of the sequences in the 
2-kernel of $y$ is $x$, and if $y$ is 2-automatic, then so is $x$. 
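
Both facts used in this proof can be observed on finite prefixes. The Python sketch below is only an illustration under our own naming (`mu`, `complement`, `kernel_prefixes`): it checks that the even-indexed letters of $\mu(x)$ give back $x$ while the odd-indexed letters give its complement, and it collects length-64 prefixes of the 2-kernel sequences of the Thue-Morse word for $e \le 4$. Comparing prefixes can only suggest that a kernel is finite; it does not prove it.

```python
MU = {"0": "01", "1": "10"}


def mu(w: str) -> str:
    """Apply the Thue-Morse morphism."""
    return "".join(MU[c] for c in w)


def complement(w: str) -> str:
    """Swap 0 and 1."""
    return w.translate(str.maketrans("01", "10"))


def kernel_prefixes(w: str, max_e: int, length: int):
    """Length-`length` prefixes of the sequences (w[2^e n + i])_n, e <= max_e, 0 <= i < 2^e."""
    seen = set()
    for e in range(max_e + 1):
        step = 2 ** e
        for i in range(step):
            seq = w[i::step][:length]
            if len(seq) == length:  # skip subsequences that are too short to compare
                seen.add(seq)
    return seen


if __name__ == "__main__":
    x = "0010011010010110"
    y = mu(x)
    print(y[0::2] == x, y[1::2] == complement(x))

    t = "0"
    for _ in range(12):  # prefix of the Thue-Morse word of length 4096
        t = mu(t)
    print(len(kernel_prefixes(t, 4, 64)))
```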

Now we can prove Theorem 4. 

Proof. Suppose the code of $x$ is ultimately periodic. Then we can write its code 
as $yz^\omega$ for some finite words $y$ and $z$. Since the class of 2-automatic sequences 
is closed under appending a finite prefix [2, Corollary 6.8.5], by Lemma 3 it 
suffices to show that the word coded by $z^\omega$ is 2-automatic. 



The word $z^\omega$ codes an overlap-free word $w$ satisfying $w = t\varphi(w)$, where $t$ is 
a finite word and $\varphi$ is a power of $\mu$. If $t$ is empty the result is clear. Otherwise, 
by iteration, we get that 

\[ w = t\,\varphi(t)\,\varphi^2(t) \cdots . \tag{27} \]

The 2-kernel of a sequence is obtained by repeated 2-decimation, that is, 
recursively splitting a sequence into its even- and odd-indexed terms. When we 
apply 2-decimation to $\mu^k(t)$, where $t$ is a finite word, we get $\mu^{k-1}(t)$ and $\mu^{k-1}(\overline{t})$. 
These words are both of even length, provided $k$ is at least 1. Hence iteratively 
applying 2-decimation to $w$, as given in (27), shows that if $\varphi = \mu^k$, then the 
2-kernel of $w$ is contained in 

\[ S := \{ u\,\mu^{i}(v)\,\mu^{i+k}(v)\,\mu^{i+2k}(v) \cdots : |u| \le 2|t| \text{ and } v \in \{t, \overline{t}\} \text{ and } 1 \le i \le k \}, \]

which is a finite set. 

On the other hand, suppose the code for $x$ is not ultimately periodic. Then 
we show that the 2-kernel is infinite. To see this, note that the code for $x$ contains 
a 2 or 4 only at the beginning, so we can assume without loss of generality that 
the code for $x$ contains only the letters 0, 1, 3. Now it is easy to see that if the 
code for $x$ is $ay$ for some letter $a \in \{0, 1, 3\}$ and infinite string $y \in \{0, 1, 3\}^\omega$, 
then one of the sequences in the 2-kernel (obtained by taking either the odd- 
or even-indexed terms) is either coded by $y$ or its complement is coded by $y$. 
Since the code for $x$ is not ultimately periodic, there are infinitely many distinct 
sequences in the orbit of the code for $x$, under the shift. (By the orbit of $y$ we 
mean the set of sequences of the form $y[i..\infty]$ for $i \ge 1$.) Now infinitely many 
of these sequences correspond to a sequence in the 2-kernel, or its complement. 
Hence $x$ is not 2-automatic. 

5 A fragility conjecture disproved 

Brown, Rampersad, Shallit, and Vasiga showed that the Thue-Morse word t is 
fragile in the following sense: if any finite nonempty set of positions is chosen, 
and the bits in those positions are simultaneously flipped to the complement of 
their original values, the result has an overlap [5]. 

It is natural to wonder if a similar result holds more generally for all overlap-free 
words. However, the statement must be modified in this more general setting, 
as (for example) both $0\mathbf{t}$ and $1\mathbf{t}$ are overlap-free. 

The author made the following conjecture at the Oberwolfach meeting in 
2010: 

Conjecture 1. For each infinite binary overlap-free word $w$ there exists a constant 
$C$ (depending on $w$) such that if the bits at any finite nonempty set of positions 
> $C$ are flipped, then the result has an overlap. 

Using our result we can disprove this conjecture. For consider the infinite 
words coded by $1\{113011, 313011\}^\omega$. By examining the automaton, each such 
word is easily seen to be a valid code for an overlap-free word. These words have 
blocks that line up exactly at the same positions, but each 6th block can be 
replaced by the appropriate power of $\mu$ evaluated at either 0 or 1, and each such 
choice gives a distinct overlap-free word. 
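
A finite experiment illustrates the counterexample. The Python sketch below uses our own helper names (`decode`, `has_overlap`) and a truncation of our choosing: it decodes two code prefixes that agree except in the 8th symbol, which starts the second 6-letter block (113011 versus 313011), and lists the positions where the decoded words differ. Under the decoding of Corollary 1, that symbol contributes the block $\mu^7(0)$ or $\mu^7(1)$, so the two words should differ on one contiguous window of 128 positions and nowhere else.

```python
P = {0: "", 1: "0", 2: "00", 3: "1", 4: "11"}
MU = {"0": "01", "1": "10"}


def mu(w: str) -> str:
    """Apply the Thue-Morse morphism."""
    return "".join(MU[c] for c in w)


def decode(indices):
    """Truncated expansion p_{i_1} mu(p_{i_2} mu(... mu(p_{i_n}) ...))."""
    word = ""
    for i in reversed(indices):
        word = P[i] + mu(word)
    return word


def has_overlap(w: str) -> bool:
    """True if w has a factor of length 2p + 1 with period p for some p >= 1."""
    n = len(w)
    for p in range(1, (n - 1) // 2 + 1):
        run = 0
        for i in range(n - p):
            run = run + 1 if w[i] == w[i + p] else 0
            if run > p:
                return True
    return False


# Code prefixes 1 113011 113... and 1 113011 313...; they differ only in the 8th symbol.
first = [1, 1, 1, 3, 0, 1, 1, 1, 1, 3]
second = [1, 1, 1, 3, 0, 1, 1, 3, 1, 3]
w1, w2 = decode(first), decode(second)
diff = [i for i in range(len(w1)) if w1[i] != w2[i]]

if __name__ == "__main__":
    print(len(w1), len(w2))
    print(diff[0], diff[-1], len(diff))
    print(has_overlap(w1), has_overlap(w2))
```

Choosing the modified block further along the code moves the window of flipped positions arbitrarily far to the right, which is what rules out a constant $C$.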

6 Remarks 

According to a theorem of Karhumäki and the author [5], there is a similar 
factorization theorem for all exponents $\alpha$ with $2 < \alpha \le 7/3$. Recently we have 
proven similar results for $\alpha = 7/3$ [10]. 

I am grateful to the referees for a careful reading of the manuscript. 